In a Webmaster Hangout, Google’s John Mueller addressed whether plagiarized content could negatively impact a site’s rankings. Mueller’s response offered insight into how Google deals with sites that steal content and the potential effects on your site.
Scraper Sites and Ranking Impact
There are numerous bad actors who steal content for use on their own sites, often through automated software. This process is known as content scraping, and sites that publish stolen content are referred to as content scrapers.
Publishers often worry that stolen content will cost them rankings in Google. It is not uncommon to search for a snippet of your content and find another site ranking with it, so the concern is an understandable one.
Here’s the question:
"A few websites have started scraping my content and publishing it. We tried contacting their hosts for a DMCA takedown without success. Does having my content scraped and republished hurt my site? Should I disavow these URLs?"
What is DMCA?
The question references a DMCA takedown. DMCA refers to the Digital Millennium Copyright Act, a US law. Its safe harbor provisions protect hosts, domain registrars, and other service providers from liability for copyright violations committed by their users, as long as they promptly remove infringing material when a copyright holder submits a takedown notice. The law also includes a counter-notice process that lets the accused party contest a takedown, which can lead to costly litigation for the content creator.
It is somewhat surprising that the publisher's DMCA requests were unsuccessful. This can happen when the web host or domain registrar is located outside the USA, where the DMCA does not apply; each country has its own legal remedies.
Does Copied Content Affect Rankings?
Google’s John Mueller explained how stolen content impacts rankings:
"From our point of view, other sites copying your content wouldn’t negatively affect your website. It’s a common situation for sites to copy content. If you’re not seeing those copies appearing in search for the queries you care about, it may not be a priority to focus on."
What Mueller said makes sense because scraper sites generally do not rank for important search queries. They can, however, sometimes rank for long-tail or non-competitive queries.
Why Scrapers Rank for Snippets of Content
It is not unusual for a scraper site to rank for a snippet of stolen content, but there is a reason for that. A snippet pasted into the search box, stripped of its surrounding context, is often a nonsensical query. If another site ranks for it, that is not because the scraper has made your site less relevant; it is because the search algorithm handles nonsensical phrases differently from meaningful queries.
Google's algorithm attempts to make sense of every search query, an almost impossible task when the query itself makes no sense. When the snippet does make sense, Google may rank other sites higher for it, but that reflects the algorithm focusing on "topics" rather than on exact wording.
Google does not rank pages simply by matching keywords, so searching for your own snippet does not guarantee your site will rank first. The key point is that content thieves generally do not rank for important search queries. Don't be concerned if you see scraper sites outranking you for snippets; that does not indicate your rankings have been weakened by stolen content.
How to Protect Against Scrapers
WordPress Anti-bot Plugins
Numerous WordPress plugins offer protection against malicious scrapers.
Wordfence
Wordfence is a popular plugin that can be configured to block scrapers for a chosen duration. It sends email alerts when attacks occur, which helps users adjust the plugin's settings and block attackers quickly.
Wordfence works by monitoring visitor behavior, specifically how many pages, and which types of pages, a visitor requests. When a visitor exceeds the configured limits, the firewall blocks it. I use Wordfence to block scrapers and hacker bots and am satisfied with its performance.
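Threshold-based blocking of this kind can be sketched as a per-IP sliding-window rate limiter. This is a hypothetical illustration, not the plugin's actual code: the `RateLimiter` class, its default thresholds, and the one-hour block duration are assumptions chosen for the example.

```python
import time
from collections import defaultdict, deque


class RateLimiter:
    """Hypothetical per-IP rate limiter (not a real plugin's code).

    A client that requests more than max_requests pages within
    window_seconds is blocked for block_seconds.
    """

    def __init__(self, max_requests=60, window_seconds=60.0, block_seconds=3600.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.block_seconds = block_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests
        self._blocked_until = {}         # ip -> time at which the block expires

    def allow(self, ip, now=None):
        """Record one request from ip; return True if it should be served."""
        now = time.monotonic() if now is None else now

        # Refuse clients that are still serving out a block.
        until = self._blocked_until.get(ip)
        if until is not None:
            if now < until:
                return False
            del self._blocked_until[ip]  # block expired

        # Forget requests that have slid out of the window, then record this one.
        hits = self._hits[ip]
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        hits.append(now)

        # Too many pages in one window looks automated, so block the IP.
        if len(hits) > self.max_requests:
            self._blocked_until[ip] = now + self.block_seconds
            hits.clear()
            return False
        return True


# Example: allow at most 3 pages per 10 seconds, then block for 60 seconds.
limiter = RateLimiter(max_requests=3, window_seconds=10, block_seconds=60)
print([limiter.allow("203.0.113.5", now=t) for t in (0, 1, 2, 3)])  # [True, True, True, False]
print(limiter.allow("203.0.113.5", now=30))  # False: still blocked
```

The block duration plays the role of the "chosen duration" mentioned above; a real firewall would also distinguish page types and verify search engine crawlers before blocking them.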
Blackhole Anti-bot WordPress Plugin
Another popular WordPress plugin is Blackhole, which also has a feature-rich and reasonably priced Pro version. Blackhole works on the honeypot principle: legitimate bots obey robots.txt and avoid a link it disallows, while bad bots ignore the rule and follow it.
Blackhole sets the trap by adding a link to the honeypot page. When a bad bot follows that link, it is blocked from further crawling. Major search engines are whitelisted, so a legitimate search engine is never blocked, even if Google follows the link.
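The honeypot idea can be sketched in a few lines. This is a hypothetical illustration, not Blackhole's implementation: the `/bot-trap/` path, the `Honeypot` class, and the user-agent allowlist are assumptions, and Python's standard `urllib.robotparser` is used only to show that a robots.txt-compliant crawler would decline to request the trap.

```python
from urllib import robotparser

# Hypothetical trap URL; it is linked from the page but disallowed in robots.txt.
TRAP_PATH = "/bot-trap/"
ROBOTS_TXT = f"User-agent: *\nDisallow: {TRAP_PATH}\n"

# Illustrative allowlist of search-engine user-agent tokens. User agents can be
# spoofed, so a real blocker should also verify crawlers (e.g. via reverse DNS).
WHITELISTED_AGENTS = ("googlebot", "bingbot")


class Honeypot:
    """Hypothetical honeypot blocker (not Blackhole's actual code)."""

    def __init__(self):
        self.blocked_ips = set()

    def handle(self, ip, path, user_agent):
        """Return the HTTP status code to send for this request."""
        if ip in self.blocked_ips:
            return 403  # previously trapped: refuse everything
        if path.startswith(TRAP_PATH):
            if any(token in user_agent.lower() for token in WHITELISTED_AGENTS):
                return 200  # whitelisted search engine: never blocked
            # Only a client that ignored robots.txt requests the trap, so block it.
            self.blocked_ips.add(ip)
            return 403
        return 200


# A robots.txt-compliant crawler checks the rules and never requests the trap.
rules = robotparser.RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())
print(rules.can_fetch("PoliteBot", "https://example.com/bot-trap/"))  # False

trap = Honeypot()
print(trap.handle("198.51.100.7", "/bot-trap/", "BadScraper/1.0"))  # 403
print(trap.handle("198.51.100.7", "/", "BadScraper/1.0"))           # 403: now blocked site-wide
```

The design relies on the trap link being invisible to human visitors, so in practice only software that crawls every link while ignoring robots.txt ever reaches it.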
Blackhole PHP
A standalone PHP version of Blackhole can be installed on any server that runs PHP, making it compatible with forum software such as XenForo or phpBB.
reCAPTCHA Enterprise Beta
Google recently announced a free beta of reCAPTCHA Enterprise, a cloud service designed to block automated scrapers, hackers, and other malicious bots. That Google offers its own solution to bad bots illustrates the importance of blocking automated bot software, including scrapers.
Should You Protect Against Scrapers?
It is wise to protect your site from automated bots. Bots often crawl at night, at the same time as Google and other legitimate crawlers. If too many malicious bots are probing your site at once, they can slow down your server, which may then return error responses (such as 5xx server errors) to Googlebot and hinder crawling and indexing.
Although John Mueller is correct that stolen content does not affect your rankings, it is still advisable to protect against scrapers so that Google can crawl and index your site effectively.
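One way to check whether this is happening is to look in your server access log for error responses served to Googlebot. The sketch below assumes the common Apache/Nginx "combined" log format; the function name and sample lines are hypothetical, and because user-agent strings can be spoofed, it counts requests that merely claim to be Googlebot.

```python
import re
from collections import Counter

# Combined Log Format: ip ident user [time] "request" status bytes "referer" "agent"
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_status_counts(lines):
    """Count status classes (2xx, 5xx, ...) served to requests claiming to be Googlebot."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # not a Combined Log Format line
        if "googlebot" not in match.group("agent").lower():
            continue  # some other visitor
        counts[match.group("status")[0] + "xx"] += 1
    return counts


sample_log = [
    '66.249.66.1 - - [10/Oct/2023:03:12:01 +0000] "GET /post HTTP/1.1" 503 0 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [10/Oct/2023:03:12:05 +0000] "GET /about HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.5 - - [10/Oct/2023:03:12:06 +0000] "GET /post HTTP/1.1" 200 5120 "-" '
    '"BadScraper/1.0"',
]
counts = googlebot_status_counts(sample_log)
print(counts["5xx"], counts["2xx"])  # 1 1
```

A rising share of 5xx responses to Googlebot during periods of heavy bot traffic would suggest the server is struggling to keep up.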
The Takeaway
The critical point is that Google confirmed scraped content does not affect your rankings.